Web Based Maltese Language Text to Speech Synthesiser
Abstract
An important factor in the growth of the internet has been the ease of professional website development. A generic platform was required on which a number of applications could be implemented and made available on the internet as easily as possible. The aim of this paper is to identify methods by which one can set up a web based interactive system on which website applications can be developed for different tasks. Such a system should lead to websites that are low cost, robust, secure, multi-lingual, aesthetically professional, fast to develop and easy to maintain. The flexibility of the chosen system was examined by implementing on it a handler for the first web based Maltese language Text to Speech (TTS) software. The results show that open source content management systems offer a free platform on which high quality and secure websites can be built with limited knowledge of web development. The flexibility of such systems makes them well suited to developing a variety of applications. In this case, the first web based Maltese language TTS system was successfully implemented.

1. Content Management Systems

A Content Management System (CMS) is web based software that helps one to develop and manage the content of a website easily. It is a tool that enables a variety of technical and non-technical staff to create, edit, manage and finally publish a variety of content (such as text, graphics, video and documents) in a number of formats, whilst being constrained by a centralised set of rules, processes and workflows that ensure coherent, validated electronic content [1]. This means that, in contrast to traditional standalone HTML web pages, a CMS driven website lets a user update a page faster and more simply through an online editor.
The primary disadvantage of deploying a CMS is the added complexity of the website system and framework.

2. CMS Deployment

The option of developing a CMS from scratch was discarded, as it is too costly to develop something which already exists in various packages and for various operating systems. There is a wide range of off-the-shelf CMS software to choose from. However, a number of steps must be taken to ensure that the best performance to cost ratio is achieved. Given that various applications had to be developed in-house within the CMS framework, the CMS needed to be as editable as possible. Thus the choice fell naturally on a Linux-flavoured open source CMS. In a user driven community, it is the activity of the community that keeps the CMS secure against the latest vulnerabilities. By examining such activity and keeping in mind the factors mentioned above, the list was brought down to five packages: CMS Made Simple, Drupal, Joomla, WordPress and Xoops. A study was then carried out on how these five open source CMS packages rate against the characteristics required for our application. Although Drupal and Joomla have very similar features, Joomla was chosen in the end because of its extensive extensions repository, which caters for all the major web applications. In addition, the ease with which one can develop one’s own extensions made Joomla the first choice. Other factors included the platform’s compatibility with multi-lingual content and its ease of maintenance.

3. Web Server Selection

Joomla was developed and tested primarily on an Apache web server. The next section shows that the already developed TTS executable requires a Windows server. A solution was therefore required in which the two different environments would run simultaneously on the same web server, and both had to be able to exchange data with each other in real time.
The final choice was to deploy a virtual Apache system on top of a Windows operating system; thus a Windows, Apache, MySQL and PHP (WAMP) system was used. As the name suggests, WAMP is a Windows program which implements a virtual Apache web server, together with the MySQL database and the PHP server side scripting language, on top of a Windows operating system.

Figure 3.1 – System block diagram

EasyPHP version 2 is a WAMP software bundle which contains Apache 2.2.3, PHP 5.2.0, MySQL 5.0.27, phpMyAdmin 2.9.1.1 and SQLiteManager 1.2.0. Such a solution gives us an ideal platform on which to install our CMS, as well as a running platform for the ANSI C TTS software.

4. The Maltese TTS Synthesiser

Speech synthesis is the artificial production of human speech, whose quality is judged by its similarity to the human voice and its ability to be understood. A TTS synthesiser is computer software that converts text into audible speech. P. Micallef [1] proposed and implemented the first TTS synthesis system for the Maltese language. This TTS software is a command line driven package written in ANSI C. It makes use of several procedures that are compiled to object code and afterwards linked together to form an executable program. There is also an external B-tree based database that is linked to the program and used to make a direct translation from grapheme to phoneme without using the rules.

Initially there are procedures that handle the locale by translating numbers, abbreviations, etc. into words. A set of rules then translates the text into phonetic words and also marks the main stressed syllable; this is divided into two procedures. The next set takes adjacent words into account and readjusts the phonetic content. The phonetic content is then input into another procedure that translates it to diphones.
This procedure makes use of the stress and length indicators to choose the appropriate diphones for vowels and for double consonants. The diphone sets are then passed to another procedure that obtains from the binary diphone database information relating to pitch, pitch positions, etc. for each diphone, and prepares an overall file based on pitch synchronous techniques. Finally, this file is used in the last procedure to play the audio. Each set is essentially independent, allowing for further development in the areas of intonation and for changes of diphone database.

5. Maltese Character System

Internally, the TTS system developed in [1] works with ASCII character encoding. Thus if a user would like to input the character ż, the Windows equivalent code has to be typed while holding down the ALT key (i.e. ALT + 167 in this case). For cross system compatibility, the software refers to the Maltese characters through constants defined in a header file rather than through direct ASCII characters. For example, the constant ZCAP is defined as 247 (presumably octal, i.e. 167 in decimal), which maps to the section sign (§) character. The Unicode equivalent of the § character is U+00A7, while its HTML equivalent is &sect;. Table 5.1 shows the equivalent values for all the Maltese language characters as used in the Maltese TTS software.

Table 5.1 – Maltese character equivalents

Joomla 1.5 offers an ideal platform for developing multi-lingual content. The standard language packs of the major languages can be downloaded from the Joomla portal and easily installed. However, since a Maltese language pack for this CMS did not exist at the time of publication of this paper, it had to be developed from scratch.

6. Web Application Development

6.1. Methodology Study

Two different options were considered for the successful integration of the existing Maltese TTS system within a CMS web based framework.
The first option was to re-code the whole Maltese TTS block, which was written in ANSI C, in the web based PHP language. The main difference between a web based application and a standard application is that the web application needs to serve different users at the same time. Web based languages such as PHP and ASP contain a large number of functions to cater for this, whereas a standard imperative language such as ANSI C does not. However, the major problem with this option was that the TTS block code (which also includes a B-tree) is quite complex.

The second option was to develop a PHP handler which would transfer data from the CMS’s web based interface to the web server, on which a modified version of the TTS block written in [1] would execute. There were two major concerns with this methodology. The first was that the web server would need to provide both a Linux environment for the CMS and a Windows environment for the TTS block. However, as shown in Section 3, this problem can be mitigated by the use of a WAMP system such as EasyPHP. The second concern was that the existing TTS block catered for only a single user at any point in time. This problem could be minimised by editing the input and output parts of the ANSI C TTS block and integrating it with a PHP web based interface.

In the end the second option was chosen, as the first involved quite a complex process which was beyond the scope of this paper.

6.2. TTS Block Implementation

Although the core of the TTS block was left intact, the original C software was heavily modified in its input and output stages. At the input stage the original software offered several text input options: a single word, a sentence, a text file or a batch file. The main menu was bypassed by making the program proceed directly to menu option 3.
This menu option takes a phrase as text input, either from the keyboard or from a text file, and processes it. The keyboard input method was discarded, and a new function was written which automatically accepts and processes input from the text file file.txt. In addition, the program reads an 8 character code from the text file id.txt. The idea behind this ID file is that the program can distinguish between different sessions whilst operating in a multiuser environment such as the WWW. The generation of the ID code is described in Section 6.4.

At the output stage, rather than generating every WAV file as stest.wav and playing it in Windows Media Player, the program stores each generated WAV file under a name derived from its corresponding ID. Following the generation of the WAV file, the program enters a loop which constantly reads the ID code from the text file id.txt. If a user has entered a new phrase, the contents of id.txt and file.txt will have been updated accordingly. Thus, if the program finds that the ID code has changed, it re-executes the above algorithm to generate a new WAV file corresponding to the new input string.

6.3. Web Handler Development

The primary aim of the PHP web handler is to provide a web interface for the TTS system. It also takes care of session control. The handler can be split into two main parts: the form on which the user writes and submits the phrase to be spoken, and the execution control which communicates with the TTS block on the server.

The input form makes use of simple HTML structures and the data is sent using the POST method. For accessibility purposes, four buttons corresponding to the special Maltese characters were added to the form for users who lack a Maltese keyboard. Of course, if the user were to enter the letter g instead of ġ, the resulting pronunciation would not be correct.
On pressing the submit button, the text stream is edited to make it compatible with the TTS block. As explained earlier, the TTS block does not accept UTF-8 encoded characters. Thus, for example, the character ż needs to be replaced with the § character before it is passed to the TTS block. In addition, the TTS block requires that a valid sentence input starts with a space and terminates with the & character. The PHP code below was therefore added to replace the Maltese characters in the input with their pre-set TTS block representations.

    $sr0 = " " . $source . " &";
    $sr1 = str_replace('ż', '§', $sr0);
    $sr2 = str_replace('ġ', '¥', $sr1);
    $sr3 = str_replace('ħ', '¦', $sr2);
    $sr4 = str_replace('ċ', '¨', $sr3);

For example, if the input is:

    Proġett ta' l-Aħħar Sena għall-Kors fl-Inġinerija

it is converted, before being passed to the TTS block, into:

    Pro¥ett ta' l-A¦¦ar Sena g¦all-Kors fl-In¥inerija &

Finally, the PHP handler saves the edited text stream in the web server’s file.txt and the session ID in the file id.txt.

Figure 6.1 – Web handler

6.4. Session Control

An important issue which must always be taken into consideration in a web environment is that of multiple inputs by different users at the same time. Suppose that a user inputs a sentence X and, a very short time later, another user inputs a sentence Y. Without session control, such operations would produce garbage data, since the parameters being used for sentence X would at some point be applied to sentence Y.

An HTTP session token is a unique identifier (usually in the form of a hash generated by a hash function) that is generated and sent from a server to a client to identify the current interaction session. The client usually stores the token as an HTTP cookie, or sends it as a parameter in GET or POST queries. PHP’s session_id function provides a unique key for every user who accesses the website.
This key remains the same until the user’s session ends. As the session key is very long, the PHP handler takes only 5 characters from the session ID and appends to them a two digit random number. The random number is appended so that different inputs from the same user are distinguishable. In order to make sure that the new key does not start with a digit, the letter a is prepended to it. These alterations are carried out to avoid any incompatibility with the TTS block’s C code. Finally, the new key is stored in the text file id.txt so that it can be used by the TTS block. The PHP code below illustrates how the session ID key is generated:

    $nid = substr($id, 5, 5);
    $rnd = rand(10, 99);
    $newid = "a" . $nid . $rnd;
    $wavid = $newid . ".wav";
    $printid = $newid . "&";

    Session ID:   i957mg6lj9kjp4rrbrn5tna4t7
    Extracted ID: g6lj9
    Random ID:    ag6lj965
    Wav ID:       ag6lj965.wav
    Print ID:     ag6lj965& (ID written in file id.txt)

Thus, with the session ID method, it is very unlikely that the system would mix up the input text of different web sessions.

6.5. System Deployment on CMS

After the PHP handler code and the TTS block had been tested within a standalone HTML website, the system had to be implemented on the CMS framework. New web applications are generally deployed within the Joomla CMS framework by developing a new extension. In our case, this new extension would have contained the PHP + HTML source code together with a number of Joomla command lines for compatibility. However, while browsing through the extensive Joomla extensions directory, the ChronoForms component was noted. This component allows the administrator simply to paste in the PHP + HTML code of a particular form, and it automatically implements the code within the Joomla component framework. Given that the standalone PHP code worked perfectly, this approach was adopted.

7. System Limitations

The TTS block developed in [1] was identified as the major component limiting the system. Of course, one cannot ignore the fact that it is also the most complex part of the system. The main issue is that when the block is given certain phrases, the executable program halts. In such circumstances, the program has to be manually restarted by the system administrator. During testing, several cases were noted to cause this problem. These cases were studied and a possible solution was drafted for each of them.

(a) Long phrase case: The program was noted to halt when the phrase input was longer than 2000 characters. The limiting factor is that the sentence string in the ANSI C software is set to take a maximum of 2000 characters. To cater for this problem, a PHP function on the web input side truncates any input string longer than 2000 characters.

(b) Overlong word case: The program was noted to halt when any input word was longer than 15 characters. Again, this problem was corrected at the web input level by using a PHP function which inserts a space into any word longer than 15 characters.

(c) Ambiguous input case: The program was noted to halt on an ambiguous input. An ambiguous input could be an English word, an incorrectly spelt Maltese word, or any other word which the TTS block cannot read. Since the TTS generation section was treated as a black box for the purposes of this paper, this problem was not addressed.

8. Suggestions for Future Work

Based on the knowledge acquired during the development of this system, it is possible to outline areas where future work on the Maltese TTS system may be carried out.
These suggestions are aimed at transforming the current prototype implementation into a system mature enough for extended testing, enabling verification of the technology in the field. As regards the TTS block, the ANSI C code should be debugged more rigorously so that whenever an ambiguous input is given, the system does not halt but simply gives a warning and is then restarted. Keeping in mind that the TTS block was written in 1997, a possible but very demanding suggestion would be to re-code the TTS program in an Object Oriented Programming (OOP) language such as Java, C# or PHP 5. A very ambitious project would be to integrate the Maltese TTS system within the Microsoft SAPI (Speech Application Programming Interface) environment. SAPI is a framework which standardises TTS for different languages within a Windows environment [2].
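
The web-side workarounds described in Section 7 for cases (a) and (b) are stated in the text but their code is not shown. The following is a minimal sketch of how such input limiting might look, not the handler's actual code. The function and constant names are hypothetical, and the sketch assumes a modern PHP (7.4 or later) with the mbstring extension, whereas the original system ran on PHP 5.2. It also assumes that the 2000 character limit applies to the phrase before the leading space and the terminating & are added.

```php
<?php
// Limits taken from Section 7; the names below are illustrative assumptions.
const MAX_PHRASE_CHARS = 2000; // sentence buffer size reported for the C program
const MAX_WORD_CHARS   = 15;   // longest word the TTS block was observed to accept

// Case (a): cut the phrase down to at most $max characters.
function truncate_phrase(string $phrase, int $max = MAX_PHRASE_CHARS): string
{
    return mb_substr($phrase, 0, $max, 'UTF-8');
}

// Case (b): break any word longer than $max characters into
// space-separated chunks of at most $max characters each.
function split_long_words(string $phrase, int $max = MAX_WORD_CHARS): string
{
    $words = explode(' ', $phrase);
    foreach ($words as $i => $word) {
        if (mb_strlen($word, 'UTF-8') > $max) {
            $words[$i] = implode(' ', mb_str_split($word, $max, 'UTF-8'));
        }
    }
    return implode(' ', $words);
}

// Split first, then truncate: splitting adds spaces, so truncating last
// keeps the final phrase within the character limit.
function prepare_phrase(string $phrase): string
{
    return truncate_phrase(split_long_words($phrase));
}

echo split_long_words('abcdefghijklmnopqrst'), "\n"; // abcdefghijklmno pqrst
```

Counting is done in characters rather than bytes, so that Maltese letters such as ħ and ż count once each, matching their single-character representation inside the TTS block.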